Extending the Rule Space Model to a Semantically-Rich Domain:
Diagnostic Assessment in Architecture

Abstract 

This paper presents a technique for applying the Rule Space model of cognitive diagnosis 
(Tatsuoka, 1983) to assessment in a semantically-rich domain. Responses to 22 architecture test
items, developed to assess a range of architectural knowledge, were analyzed using Rule Space. 
Verbal protocol analyses guided the construction of a model of examinee performance, consisting 
of processes for constructing an initial representation of an item (labeled understand), forming
goals and performing actions based on those goals (solve), and determining whether goals have
been attempted and satisfied (check). Item attributes, derived from these processes, formed the
basis for diagnosis. Our technique extends Rule Space's applicability by defining attributes in 
terms of item characteristics and the causal relations between characteristics and the problem- 
solving model. 

Data were collected from 122 architects of various ability levels (students, architecture 
interns, and professional architects). Rule Space successfully classified approximately 65%, 90%, 
and 40% of examinees based, respectively, on attributes associated with the understand, solve, and 
check processes of the problem-solving model. The findings support the effectiveness of Rule 
Space in a complex domain and suggest directions for developing new architecture items by using 
attributes particularly effective at distinguishing among examinees of different ability levels. 



Index terms: diagnostic assessment; problem solving; architecture; rule space; item attributes;
computer-based testing 
Extending the Rule Space Model to a Complex Domain:
Diagnostic Assessment in Architecture 

As testing programs begin to employ new forms of assessment, a common goal is to 
construct tests whose demands are closely related to tasks in the target domain (Wiggins, 1989). 
While recent research has presented several types of assessment tasks (e.g., simulation) that more 
accurately capture relevant knowledge and skills, there remains the issue of performance reporting: 
How can we provide examinees with information beyond scores of overall proficiency, 
information that captures the richness of knowledge and skills in a domain? In the current work, 
we employ the Rule Space Model (Tatsuoka, 1983) to generate descriptions of examinee ability 
that are far richer than those normally derived from large-scale assessment. However, Rule Space
has in the past been applied successfully only to relatively narrow topics in well-defined
domains (e.g., mixed number subtraction, single-variable isolation in algebra). This paper 
presents a technique for applying the Rule Space model of cognitive diagnosis (Tatsuoka, 1983) to 
a semantically-rich domain in need of more authentic, yet tractable, assessments: architecture. 

Architecture Assessment 

Current architecture assessments consist primarily of short, verbal multiple-choice 
questions or complex items that mimic the tasks architects normally encounter in the workplace. 
Because architecture is a complex domain, individuals' scores on relatively simple, verbal
multiple-choice tests do not capture the complexity of the knowledge and skills to be assessed. We address
these issues by presenting examinees with figural response test items (Martinez, 1991; in press) 
and by generating diagnostic profiles of examinees based on their performance using the Rule 
Space model (Tatsuoka, 1983). 

The figural response items used in this study differ from standard multiple-choice items in 
that examinees must construct their answers and the responses consist of the generation or 
manipulation of figural material (e.g., graphs, pictures). Figural response items are especially 
suited to domains that are graphical or pictorial in nature; the domain of architecture is a natural 
candidate for this form of assessment. The approach of using figural response items for
architecture assessment has a number of advantages. First, architecture is a graphical domain; 
designs are drawn, rather than essays being written. Thus, the figural response format provides a 
natural way for architects to express their ability. Second, constructed response items may be able 
to tap skills otherwise inaccessible using the multiple-choice format. Martinez & Katz (1992) 
showed, for example, that different skills are frequently tapped by figural response items compared 
with their multiple-choice counterparts. 

In this study, the figural response items were computer delivered; a sample item is shown 
in Figure 1. Each item consists of a stem (top of screen), a diagram, and a set of tools for drawing 
on or manipulating the diagram. The item in Figure 1 requires examinees to move the structures at
the bottom of the screen (library, parking lot, and playground) onto the provided site, subject to
the explicit constraints stated in the item stem as well as to the implicit constraints that architects 
associate with libraries, parking lots, and playgrounds (e.g., a playground should not be adjacent 
to a parking lot; a parking lot must have street access). 



Insert Figure 1 about here 



Architecture brings certain challenges to the practice of large scale assessment. First, much 
of architectural practice requires design, a notoriously complex cognitive skill. The duration of 
design projects in architecture is typically measured in days or months, not minutes as with the
usual examination item. Also, design tasks do not typically have "right" or "wrong" answers.
Rather, a continuum of designs satisfies the constraints of the task to a greater or lesser extent.
Further, in the real world, constraints on a design task are not immutable; often the architect may
relax certain initially specified constraints that he or she believes would allow for a better design 
(Goel & Pirolli, 1991). We do not seek to assess design skills directly. Although some of the 
figural response items present simple design tasks, most were meant to assess architectural 
knowledge through subsidiary tasks. For example, two items present a diagram of a building and 
ask the candidate to specify locations of seismic joints. While a corresponding task set for an 
architect might not be this simple, the task could come up as part of a larger design task in the real 
world. 

Architecture may be classified as a "semantically rich domain" (Simon, 1984) in that skilled 
performance involves extensive specialized knowledge. Architecture knowledge is usually gained 
over several years of intense study. This knowledge comes from a variety of disciplines, including 
civil engineering, physics, history, psychology, construction, and art. This forms a second 
challenge for architectural assessment. Optimally, assessment should produce similarly rich
descriptions of proficiency based on test performance; the Rule Space Model is our means of
generating such descriptions.

Our approach, like that of many emerging test theories, blends traditional psychometric 
approaches with developments in cognitive psychology (Gitomer & Yamamoto, 1991). Some new 
approaches, including Rule Space, build on item response theory (IRT), in which individuals and
items are ordered along a proficiency continuum (Lord & Novick, 1969). One well-known 
shortcoming of IRT is that identical estimates of overall proficiency may be derived from radically 
different response patterns. If information about response patterns could be simplified and 
preserved, these rich descriptions of performance could be truly diagnostic (Mislevy, 1993). 

The Rule Space Model 

The Rule Space model provides descriptions of examinee performance that extend beyond 
raw scores or uni-dimensional IRT estimates of overall proficiency. Items are decomposed into 
attributes, which represent the latent traits that the items assess. Based on an examinee's pattern of 
correct and incorrect responses, the Rule Space model infers the most likely combination of 
attributes the examinee has mastered. 

The diagnosis of cognitive errors made by examinees is a pattern classification problem. In
this study, the patterns are item response vectors whose elements are ones and zeroes indicating
correct and incorrect responses, respectively. Each response vector is classified into one of a set
of latent knowledge states. The Rule Space model, developed to solve this classification
problem, has three steps: (1) determination of classification groups, (2) formulation of a 
classification space, and (3) classification of examinees' responses. 

Determination of Classification Groups

We assume that each postulated cognitive attribute (declarative knowledge, cognitive
processes, solution strategies, and so forth) is tapped by at least one item in the pool. The
relationship between these cognitive attributes and the items is expressed by an incidence matrix Q
of order $K \times n$, where $K$ is the number of cognitive attributes and $n$ is the number of items.
If item $j$ involves attribute $k$, then $Q_{kj} = 1$; otherwise $Q_{kj} = 0$. Each item is therefore
characterized by the cognitive attributes required for its solution.
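A minimal Python sketch of the incidence matrix Q and the rule it encodes, assuming the usual Rule Space convention that, in the absence of slips, an item is answered correctly only when every attribute it involves has been mastered. The function names `build_q_matrix` and `ideal_response` are hypothetical, introduced only for illustration; they are not part of the BUGLIB or RULESPACE programs.

```python
import numpy as np

def build_q_matrix(item_attributes, n_attributes):
    """Return a K x n incidence matrix: Q[k, j] = 1 if item j involves attribute k."""
    q = np.zeros((n_attributes, len(item_attributes)), dtype=int)
    for j, attrs in enumerate(item_attributes):
        for k in attrs:
            q[k, j] = 1
    return q

def ideal_response(q, mastery):
    """Slip-free response vector for a 0/1 attribute-mastery vector.

    Item j is scored correct only if none of its required attributes
    is unmastered.
    """
    mastery = np.asarray(mastery)
    missing = q.T @ (1 - mastery)   # unmastered required attributes per item
    return (missing == 0).astype(int)

# Three items: attribute A1 (index 0) for items 1 and 3, A2 (index 1) for item 2.
Q = build_q_matrix([[0], [1], [0]], n_attributes=2)
print(Q)
print(ideal_response(Q, [1, 0]))  # examinee who has mastered A1 only
```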
For example, suppose there are three items whose two underlying attributes are denoted A1
and A2. Further, suppose A1 is needed to solve items 1 and 3, and A2 is required in item 2.
Then, the incidence matrix Q (2 x 3) is:

                Item 1   Item 2   Item 3
Attribute A1      1        0        1
Attribute A2      0        1        0

With three items, there are eight possible response vectors: 

(0,0,0), (1,0,0), (0,1,0), (0,0,1), (1,1,0), (1,0,1), (0,1,1), (1,1,1). 

Given two attributes, there are four possible examinee knowledge states: 

State 1. Examinee cannot do A1, but can do A2
State 2. Examinee cannot do A2, but can do A1
State 3. Examinee can do neither A1 nor A2
State 4. Examinee can do both A1 and A2

There are four ideal response vectors conforming to the four states: 

State 1. (0,1,0) 
State 2. (1,0,1) 
State 3. (0,0,0) 
State 4. (1,1,1)

Note that each ideal response vector corresponds to a unique vector of mastered attributes. 
The remaining possible response vectors, (1,0,0), (0,0,1), (1,1,0), and (0,1,1), do not conform
precisely to any of the four states. The section entitled Classification of Examinees' Responses
discusses Rule Space's treatment of such "non-ideal" response vectors. 
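The example can be checked mechanically. This short sketch (the names `IDEAL` and `match_state` are hypothetical) lists all eight response vectors for three items and separates the four ideal vectors from the four non-ideal ones.

```python
from itertools import product

# Ideal response vectors for the four knowledge states in the example.
IDEAL = {
    "State 1 (A2 only)": (0, 1, 0),
    "State 2 (A1 only)": (1, 0, 1),
    "State 3 (neither)": (0, 0, 0),
    "State 4 (both)": (1, 1, 1),
}

def match_state(response):
    """Return the state whose ideal vector equals `response`, or None if non-ideal."""
    for state, ideal in IDEAL.items():
        if tuple(response) == ideal:
            return state
    return None

all_vectors = list(product((0, 1), repeat=3))  # the 8 possible response vectors
non_ideal = [v for v in all_vectors if match_state(v) is None]
print(non_ideal)
```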

Tatsuoka (1991) and Varandi & Tatsuoka (1990) developed an algorithm to produce all 
possible ideal response patterns, corresponding to all possible latent knowledge states from an 
incidence matrix Q. The number of states is determined from the number of attributes, the number 
of items, and the degree of attribute nesting. In applying Rule Space to other data sets, the number 
of latent states has often exceeded 1000. 
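The effect of attribute nesting on the number of states can be seen with a brute-force enumeration. This is not the Tatsuoka/Varandi algorithm, which is far more efficient; it simply tries all $2^K$ mastery patterns, so it is only practical for small $K$. The function `enumerate_states` and the nested Q matrix below are hypothetical. Mastery patterns that produce the same ideal response pattern cannot be told apart by the items, so they collapse into a single classification group.

```python
from itertools import product
import numpy as np

def enumerate_states(q):
    """Map each distinct ideal response pattern to the mastery patterns producing it.

    Brute force over all 2**K attribute-mastery patterns (K = q.shape[0]).
    """
    states = {}
    for mastery in product((0, 1), repeat=q.shape[0]):
        missing = q.T @ (1 - np.array(mastery))   # unmastered required attributes per item
        pattern = tuple(int(m == 0) for m in missing)
        states.setdefault(pattern, []).append(mastery)
    return states

# Hypothetical nested example: item 1 needs A1; item 2 needs both A1 and A2.
Q_nested = np.array([[1, 1],
                     [0, 1]])
for pattern, masteries in enumerate_states(Q_nested).items():
    print(pattern, masteries)
```

Here four mastery patterns yield only three distinguishable states, because mastering A2 without A1 leaves the ideal responses unchanged.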

The Classification Space 

In order to preserve continuity with current psychometric theories, the classification space
was formulated as a two-dimensional Cartesian product space of the IRT proficiency parameter $\theta$
and an index of the unusualness of an item response pattern, $\zeta$, where "unusualness" refers to the
degree to which easier items are answered incorrectly and difficult items are answered correctly
(Tatsuoka & Linn, 1981; Tatsuoka, 1984; 1985; 1990; Tatsuoka & Tatsuoka, 1987). When an
examinee's response vector conforms well to the average performances on the test items, the
absolute value of $\zeta$ will be nearly zero. When the $\zeta$-value of a knowledge state is close to
zero, that is, close to the $\theta$-axis, we can expect that many examinees will be diagnosed as having
that knowledge state. If the $\zeta$-value associated with a knowledge state is large, positively or
negatively, then we expect that state to be unusual in the sense that few examinees will be
diagnosed as having that knowledge state.
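A sketch of one form of the index $\zeta$, as we understand the standardized extended caution index of Tatsuoka (1984): with $P_j(\theta)$ the 2PL probability of answering item $j$ correctly and $T(\theta)$ the mean of the $P_j(\theta)$, the statistic $f(x) = \sum_j (P_j - x_j)(P_j - T)$ is divided by its model-implied standard deviation. The item parameters are hypothetical, $\theta$ is fixed at 0 for both patterns purely for illustration, and the 1.7 scaling constant is omitted; operational software may differ in these details.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response (logistic form, no 1.7 scaling)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def zeta(x, theta, a, b):
    """Standardized caution index: positive values flag unusual patterns
    (easy items missed, hard items answered correctly)."""
    x = np.asarray(x, dtype=float)
    p = p_2pl(theta, a, b)
    t = p.mean()
    f = np.sum((p - x) * (p - t))
    # Under the model E[f] = 0 and Var[f] = sum_j P_j (1 - P_j) (P_j - T)^2.
    sd = np.sqrt(np.sum(p * (1 - p) * (p - t) ** 2))
    return f / sd

# Hypothetical item parameters, ordered from easy to hard.
a = np.array([1.0, 1.2, 0.8, 1.5, 1.0])
b = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
typical = [1, 1, 1, 0, 0]   # easy items right, hard items wrong
unusual = [0, 0, 1, 1, 1]   # easy items wrong, hard items right
print(zeta(typical, 0.0, a, b), zeta(unusual, 0.0, a, b))
```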

Classification of Examinees' Responses 

Examinees' performances on test items are not always consistent with their unobservable
patterns of attribute mastery. Responses that deviate from an ideal response pattern are assumed to
contain random errors or slips. Under the assumption that occurrences of slips on items are
independent across items, Tatsuoka & Tatsuoka (1987) showed that the number of slips follows a
binomial distribution if the slippage probabilities are the same across items, and a compound
binomial distribution if the slippage probabilities differ across items.
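Both cases can be computed the same way: the distribution of the number of slips is the convolution of one Bernoulli distribution per item, which reduces to the ordinary binomial when every item has the same slip probability. The slip probabilities below are hypothetical.

```python
import numpy as np

def slip_count_distribution(slip_probs):
    """P(number of slips = s), s = 0..n, for independent items with
    item-specific slip probabilities (compound binomial).

    Built by convolving one Bernoulli(p_j) distribution per item.
    """
    dist = np.array([1.0])
    for p in slip_probs:
        dist = np.convolve(dist, [1.0 - p, p])
    return dist

# Hypothetical slip probabilities for five items.
equal = slip_count_distribution([0.1] * 5)   # same as Binomial(5, 0.1)
unequal = slip_count_distribution([0.05, 0.1, 0.1, 0.2, 0.3])
print(equal.round(4))
print(unequal.round(4))
```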

When the non-ideal response patterns associated with a particular ideal pattern, R, are
mapped into the Rule Space (by computing their $\theta$ and $\zeta$ values), they form a unique subset that
swarms around the point ($\theta_R$, $\zeta_R$). The swarm of mapped points in the Rule Space follows
approximately a multivariate normal distribution with centroid ($\theta_R$, $\zeta_R$), and is called the bug
distribution or state distribution associated with response pattern R (Tatsuoka, 1990). When all
possible ideal item response patterns are mapped onto the Rule Space, one can apply Bayes'
decision rules, which minimize classification error, to assign an examinee's point ($\theta_X$, $\zeta_X$) to
one of the possible latent states. More detailed discussions of the classification procedure can be
found in Tatsuoka (1990), Tatsuoka & Tatsuoka (1987, 1989), and Sheehan, Tatsuoka, & Lewis
(1991).
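A minimal sketch of the Bayes decision step, assuming each state distribution is bivariate normal in ($\theta$, $\zeta$) and assigning the examinee to the state with the largest posterior probability. The centroids, covariance matrix, and equal priors are hypothetical; with a common covariance and equal priors, this rule reduces to choosing the nearest centroid in Mahalanobis distance.

```python
import numpy as np

def log_bivariate_normal(x, mean, cov):
    """Log density of a bivariate normal distribution at point x."""
    diff = np.asarray(x) - np.asarray(mean)
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (diff @ inv @ diff + logdet + 2 * np.log(2 * np.pi))

def classify(point, states, priors):
    """Return (best state, posterior probabilities) under normal state distributions."""
    names = list(states)
    log_post = np.array([
        np.log(priors[n]) + log_bivariate_normal(point, *states[n]) for n in names
    ])
    post = np.exp(log_post - log_post.max())   # subtract max for numerical stability
    post /= post.sum()
    return names[int(np.argmax(post))], dict(zip(names, post))

# Hypothetical state centroids (theta_R, zeta_R) sharing one covariance matrix.
cov = np.array([[0.10, 0.0], [0.0, 0.50]])
states = {
    "state A": ((-1.0, 0.5), cov),
    "state B": ((0.0, -0.5), cov),
    "state C": ((1.2, 0.2), cov),
}
priors = {name: 1 / 3 for name in states}
best, posterior = classify((0.1, -0.3), states, priors)
print(best, posterior)
```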

Applying Rule Space to Architecture Assessment 

The items used in this research were intended to assess a wide range of architectural 
knowledge and skills across several subdisciplines of architecture. Different items required 
different problem-solving operations. For example, some items required examinees to specify the 
properties of structural elements while others required the proper arrangement of architectural 
elements on the computer. The range of operations used across items implied that defining 
attributes in terms of low-level operations would produce an attribute set with little overlap across 
items. This would defeat the purpose of Rule Space, which relies on attributes shared across items to distinguish knowledge states. We therefore analyzed the architecture
items at a coarser grain, using attributes descriptive of higher-level processing as suggested by a 
general model of problem solving. This approach required a modification to the procedure used in 
other Rule Space analyses. We first defined a cognitive model that was general enough to account 
for problem-solving behavior on all items. Attribute definitions were then based on the model. In 
the next section, we describe the cognitive model and our procedure for defining item attributes. 

The Cognitive Model 

Our cognitive model was derived in part from a theory of computer interface use (Lewis &
Polson, 1990). This model was chosen because of ostensible similarities between problem solving
in user interface evaluation and solution of figural response items. Our adaptation of Lewis and
Polson's model was based on verbal protocols from one pilot subject who solved all 22
architecture items.¹ The analysis of protocols from a single subject was not used to produce a
definitive cognitive model, but a hypothesized model which would guide us in developing 
reasonable attributes. The reasonableness of this hypothesized model could, in turn, be supported 
or falsified by our data. 



¹ This pilot subject was not part of the test administration discussed in the next section.






The model consists of processes relevant for constructing an initial representation of the 
item (i.e., understanding the problem stem and provided diagram), forming goals and performing
actions based on those goals (i.e., solving the item), and determining whether goals have been 
satisfied and if they have been satisfied correctly (i.e., checking each problem solving step and the 
final answer). The model asserts that these processes exist, but makes no claims as to their order. 
For example, an examinee might come to a new understanding of a problem after attempting to 
solve it or after checking an initial, incorrect solution. The processes hypothesized by the model 
are summarized in Table 1. 



Insert Table 1 about here 



Understand. The first step in solving any item is to understand what is being asked so that
the appropriate knowledge can be invoked. Each figural response item consisted of both a verbal
stem and a diagram, the latter of which may contain both graphical and verbal information. Thus, 
understand processes include: (a) reading and interpreting the verbal stem, (b) scanning and 
interpreting the diagram, and (c) relating the information in the stem and diagram to one's own
knowledge. This processing allows the examinee to form initial goals, and either a plan for 
solving the item or a set of heuristics. An initial goal might be to apply a strategy learned in the 
classroom or to invoke a general problem-solving method such as means-ends analysis, in which 
one chooses at each step an action that will reduce the difference between the current state of the 
problem and the desired goal state. In specifying the understand processes (read stem, scan
diagram, and relate to one's own knowledge), no claims are made as to either the ordering of the
processes or the conditions under which they occur. Particular items will be less or more difficult 
in terms of, say, reading and interpreting the stem, and it is just these sorts of differences which 
form the basis for the item attribute definitions. 

Solve. Once an initial representation of the problem has been built, and the initial goals 
formed, the examinee must perform the actions that lead to solving the problem. Of course, while 
solving a problem, an examinee may reformulate or refine an initial representation of an item. The 
processes involved in solving an item are applied to each goal that has not yet been satisfied. Each 
of these goals may be elaborated by forming subgoals of the currently active goal or the examinee 
may perform an action that will satisfy the current goal. An action may be physical, such as 
drawing a line, or cognitive, such as finding a level area on a contour map. These two processes, 
elaboration of goals and performance of actions, do not determine precisely how a particular item is 
solved. Certain questions are left open. For example, which subgoals are formed when a 
particular goal is elaborated? How does the examinee decide on which actions to perform to satisfy 
a goal? Answering these questions requires knowledge of the particular strategies used to solve
each item. Whatever strategy an examinee uses (whether problem-specific or general), that 
strategy will determine which goals are attended to and in what order, and what subgoals are 
formed. 

Check. Once an action has been performed, the results of that action may be evaluated to
ensure that the action was performed correctly and that it satisfies the original goal. If both
conditions are met, the examinee may mark that goal as finished (perhaps by saying something to
the effect of "Okay, that's done") and proceed to the next unsatisfied goal. Thus, two types of
evaluations may occur: monitoring whether an action has been carried out as planned and noting
whether it satisfies the original goal.
Attribute Creation

Because the figural response items were designed to assess a wide range of architectural 
knowledge and skill, defining attributes in terms of the actual steps candidates take in solving the 
items (the approach used in previous applications of Rule Space) was contraindicated. Instead,
we defined attributes in terms of item characteristics or features. Each item has multiple features
and could be classified along several dimensions, but for purposes of attribute creation we
identified those features with a potential causal connection to examinee performance. The attributes
were defined by identifying features of the items that could be expected either to help or hinder
problem solving. For example, we hypothesized that problem solving would be hindered during
the process "scan the provided diagram" if the diagram was a specialized graph (e.g., a
topographic map) that would not be understood by all examinees. The 38 attributes identified in
the task analysis are listed in Table 2. To illustrate the assignment of attributes to items, Table 3 
shows the attributes associated with the "library" item of Figure 1 along with an explanation of 
why that attribute was assigned. 

Each attribute is associated with one or more of the three types of processing (understand, 
solve, and check), and those assignments are shown in Table 4. The assignment of attributes to 
processes was made by two independent judges with an inter-rater agreement of 88%.
Disagreements were settled through discussion between the judges. Two independent judges also 
determined the subset of elementary cognitive attributes needed to solve each question. The inter- 
rater reliability for this process was again 88%. As before, disagreements were settled through 
discussion between the judges. 



Insert Tables 2, 3, and 4 Here 



Method 

Materials and Design 

Twenty-two figural response questions were constructed to draw upon skills needed 
throughout the broad content of an architectural licensing examination. These questions were
developed for presentation on a computer with responses made through mouse movements and 
clicks. The questions were divided into two eleven-item subsets, and each subset was
administered to a random half of the available subjects.²

Subjects 

Subjects (N=122) were selected from three status groups: practicing architects (N=34), 
architecture interns (N=35), and architecture students (N=53). The eleven item responses 
provided by each subject were scored correct/incorrect and modeled with a two-parameter logistic
IRT model. Maximum likelihood estimates of proficiency ($\theta$) were subsequently obtained for each
subject. These estimates were used to classify subjects into three equal-sized proficiency groups.
The cross-tabulation of status groups and proficiency groups is shown in Table 5. 
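The proficiency step can be sketched as follows, assuming the 2PL item parameters are already known (item calibration is not shown). Newton-Raphson is one standard way to maximize the 2PL likelihood and is not necessarily the routine used in this study. The names `theta_mle` and `tertile_groups`, the clipping bound, and the simulated data are hypothetical.

```python
import numpy as np

def theta_mle(x, a, b, theta0=0.0, n_iter=50, bound=4.0):
    """Newton-Raphson maximum likelihood estimate of theta under the 2PL.

    The MLE is infinite for all-correct or all-incorrect patterns, so the
    estimate is clipped to [-bound, bound] (a convention assumed here).
    """
    x = np.asarray(x, dtype=float)
    theta = theta0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        grad = np.sum(a * (x - p))            # derivative of the log-likelihood
        info = np.sum(a ** 2 * p * (1 - p))   # test information at theta
        step = grad / info
        theta = float(np.clip(theta + step, -bound, bound))
        if abs(step) < 1e-8:
            break
    return theta

def tertile_groups(thetas):
    """Split examinees into low/medium/high groups of (nearly) equal size by rank."""
    order = np.argsort(thetas, kind="stable")
    groups = np.empty(len(thetas), dtype=object)
    for label, idx in zip(("low", "medium", "high"), np.array_split(order, 3)):
        groups[idx] = label
    return groups

rng = np.random.default_rng(0)
a = rng.uniform(0.8, 1.6, size=11)              # hypothetical discriminations
b = np.linspace(-2, 2, 11)                      # hypothetical difficulties
responses = rng.integers(0, 2, size=(9, 11))    # simulated 0/1 response patterns
thetas = np.array([theta_mle(r, a, b) for r in responses])
print(tertile_groups(thetas))
```

For 122 examinees, `np.array_split` produces groups of 41, 41, and 40.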



² Subjects solved only eleven of the figural response items because they were also administered a set of
complementary multiple-choice items. Time constraints did not permit additional testing. Contrasts between item 
sets are reported in another study (Martinez & Katz, 1992). 
Insert Table 5 about here



Procedure 

In groups of six, subjects were given a verbal introduction to the item delivery system. 
Following that, they each attempted the items individually on a computer. Of the 122 subjects, 
three subjects generated verbal protocols to gather independent support for the cognitive model.
To generate the protocols, the subjects were asked to "think aloud" (Ericsson & Simon, 1984), 
saying anything that they would normally "say" to themselves as they solved the items. 

Rule Space Analyses 

Rule Space analyses were conducted separately for each of the three groups of problem- 
solving attributes identified above. This strategy was chosen for two reasons. One very practical 
reason is that the combination of attributes made the possible number of knowledge states
astronomical for the entire set of 38 attributes; thus, the total pool of attributes had to be
subdivided. A second reason was to contrast attribute clusters in their ability to classify examinees.
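The scale of the problem can be made concrete with a simple bound (our own arithmetic, not a figure produced by the analyses): $K$ binary attributes admit at most $2^K$ mastery patterns, so for all 38 attributes

$$2^{38} = 274{,}877{,}906{,}944 \approx 2.7 \times 10^{11}.$$

Analyzing each process type separately replaces this bound with $2^{K_p}$, where $K_p$ is the number of attributes assigned to process $p$ (Table 4); the number of usable states can be smaller still, because mastery patterns that produce identical ideal response patterns cannot be distinguished by the items.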

Rule Space was carried out in two steps: First, the BUGLIB computer program (Varandi 
& Tatsuoka, 1990) was used to determine the set of all possible latent knowledge states associated 
with the specified stage; second, the RULESPACE computer program (Tatsuoka, Bailie & 
Sheehan, 1990) was used to classify subjects into one of the knowledge states. Three attempts 
were made to classify each examinee, one for each of the three problem-solving process types 
(understand, solve, and check).

Results 

Verbal Protocol Results 

Our cognitive model postulated that certain processes would be used as a subject solved the 
architecture items. One way to gather evidence for the model is to show that these processes are 
sufficient for explaining the verbalizations made by subjects (Ericsson & Simon, 1984). Eight 
categories of subject verbalizations were defined, one category for each process in the cognitive 
model and a "miscellaneous" category. These categories were defined through examining 
verbalizations of the pilot subject as she solved eleven of the items. The sufficiency of the 
categories was established by attempting to categorize the verbalizations on the remaining eleven 
items. One rater categorized all of the subject's verbalizations, while another rater independently
categorized a portion of the verbalizations. The inter-rater agreement on the portion scored by both 
raters was 82%. The final categories are shown in Table 6. The verbalizations encoded as 
miscellaneous include single words or short phrases ("Okay," "Let's see"), statements concerning
the computer interface ("I have to click twice"), and statements irrelevant to the task.



Insert Table 6 about here 



The categorization scheme was applied to the verbal reports of the three protocol subjects. 
The cognitive model accounted for 71% of the verbalizations made by subjects; the remaining 
verbalizations fell into the miscellaneous category. This result suggests that the model adequately 
captured subjects' problem-solving performance, and thus supports the validity of the cognitive
attributes created from this model.

Rule Space Results 

The projection of examinee response data into the two-dimensional Rule Space is presented 
in Figure 2. Examinees' $\theta$ values are plotted along the x-axis; $\zeta$ values are plotted along the
y-axis. The symbols indicate status group membership. The plot shows that practicing architects are
located mostly in the medium to high proficiency region and form a cluster that is distinct from the 
points plotted for interns and students. 



Insert Figure 2 about here 



Each examinee's performance was diagnosed three times, once for each of the understand, 
solve, and check attributes. For each diagnosis, the examinee's point in the rule space was 
compared to the points corresponding to the set of knowledge states associated with each attribute
group. The item/attribute incidence matrices developed for each problem-solving process type
determined the number of possible states: 803 for understand, 1208 for solve, and 121 for check.
Within each process type, each knowledge state corresponded to a unique combination of mastered 
attributes and is represented by a unique point in the Rule Space. 

The classification results for each of the three types of problem-solving processes are
presented in Table 7. Within each process type, the number and percentage of classified examinees
is broken down by IRT-proficiency level (low, medium, and high) and status group (student,
intern, architect). Two patterns are worth noting. The first is that the solve attributes are the most
powerful in classifying subjects across proficiency levels and status groups; in fact, all 41
low-proficiency examinees were classified. The next most powerful set of attributes is understand,
followed by check. A second pattern is that, almost uniformly, examinees in the lower proficiency
or status groups were more often classified than those in the higher groups. For example, under
check, the percentage of low-proficiency examinees classified (61%) was twice that of
high-proficiency examinees (30%).



Insert Table 7 about here 



The low classification rate achieved for the check processes is considered in Figure 3. In 
this plot, the diamonds stand for latent knowledge states and the boxes indicate the examinees'
diagnostic location. The plot shows that the 121 knowledge states deduced from the check 
incidence matrix do not coincide with the examinees' points. Thus, the attributes defined from the 
check portion of the model do not capture examinee behavior, suggesting that examinee 
performance is not greatly differentiated by check processes (or that we need to rework that portion 
of the model). 
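One way to see how an examinee can remain unclassified is a distance cutoff: if even the nearest knowledge state is too far from the examinee's point, no state is assigned. The sketch below uses Mahalanobis distance with a chi-square (2 df, 95%) cutoff of 5.991; this cutoff, the function `nearest_state`, and the example centroids are assumptions for illustration and are not taken from the RULESPACE program.

```python
import numpy as np

CHI2_2DF_95 = 5.991  # 95th percentile of the chi-square distribution with 2 df (assumed cutoff)

def nearest_state(point, centroids, cov, cutoff=CHI2_2DF_95):
    """Return (index of nearest state or None if unclassified, squared distance)."""
    inv = np.linalg.inv(cov)
    diffs = np.asarray(centroids, dtype=float) - np.asarray(point, dtype=float)
    d2 = np.einsum("ij,jk,ik->i", diffs, inv, diffs)  # squared Mahalanobis distances
    best = int(np.argmin(d2))
    return (best if d2[best] <= cutoff else None), float(d2[best])

# Hypothetical state centroids in (theta, zeta) and a common covariance matrix.
cov = np.array([[0.10, 0.0], [0.0, 0.50]])
centroids = [(0.0, 0.0), (1.0, 0.0)]
print(nearest_state((0.05, 0.1), centroids, cov))   # close to the first state
print(nearest_state((0.5, 3.0), centroids, cov))    # far from every state: unclassified
```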



Insert Figure 3 here 
Attribute Mastery Probabilities

An attribute mastery vector was estimated for each classified examinee. These vectors are 
composed of zeros and ones, depending on whether the attribute in question was included in the 
subset of mastered attributes defined for the examinee's state. Attribute mastery patterns were 
averaged within proficiency and status groups, and analyzed using a repeated measures analysis of 
variance design, as described in Sheehan, Tatsuoka, and Lewis (1991).³
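The averaging step can be sketched as follows (the repeated measures analysis of variance itself is not shown). The function `mean_mastery_by_group` and the mastery vectors are hypothetical; each row is one classified examinee's 0/1 attribute mastery vector.

```python
import numpy as np

def mean_mastery_by_group(mastery, groups):
    """Average 0/1 attribute-mastery vectors within each group.

    mastery: (n_examinees, n_attributes) array of 0/1 values
    groups:  length-n_examinees sequence of group labels
    Returns {label: mean mastery vector}.
    """
    mastery = np.asarray(mastery, dtype=float)
    groups = np.asarray(groups)
    return {g: mastery[groups == g].mean(axis=0) for g in np.unique(groups)}

# Hypothetical mastery vectors for four classified examinees on three attributes.
m = [[1, 0, 1],
     [1, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
print(mean_mastery_by_group(m, ["high", "high", "low", "low"]))
```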

P-values for the analysis of variance F-tests are reported in Table 8. The table provides 
evidence for three clearly significant effects: proficiency group, attribute, and the attribute by 
proficiency group interaction. These results are reassuring because they indicate that the attributes 
associated with each problem-solving stage are differentially difficult and that examinees in 
different proficiency groups tend to have different attribute mastery profiles. 

The results obtained for the status group classification are not as clear-cut. Although the 
main effect of status group is clearly not significant, the interaction of status group with attribute is 
marginally significant. This indicates that the average probability of mastery values calculated for
some attributes differed among students, interns, and practicing architects, but these differences 
did not hold up after averaging over all attributes. Thus, on the average, examinees in different 
status groups did not differ in their mastery of the elementary cognitive skills identified in this 
study. 



Insert Table 8 about here 



Table 9 presents the mean probability of mastery values estimated for the solve attributes. 
The different attribute mastery profiles obtained for low, medium and high proficiency examinees 
are clearly indicated. The table also shows that attributes differ in discrimination. For example, 
consider the probabilities listed for the "environment" attribute: On average, low proficiency 
examinees mastered this attribute with a probability of .47; the corresponding probabilities for 
medium and high proficiency examinees are .60 and .97, respectively. The varying probabilities 
obtained for low, medium, and high proficiency examinees indicate that this attribute is highly 
discriminating. By contrast, the three mean values listed for the "learned procedure" attribute are 
all very similar. Thus, this attribute is not particularly helpful at discriminating among examinees 
of different ability levels. 
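A simple numerical summary of this contrast (our own illustration; the study does not define a discrimination index) is the difference between the high- and low-proficiency mean mastery probabilities. For the "environment" attribute:

$$\Delta_{\text{environment}} = \bar{p}_{\text{high}} - \bar{p}_{\text{low}} = .97 - .47 = .50$$

An attribute whose three group means are all very similar, such as "learned procedure," would have a difference near zero.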



Insert Table 9 about here 



Discussion and Conclusions 

This study exemplifies how an IRT-based model for estimation of overall proficiency can 
be combined with the diagnostic classification of examinees. The results of the application of Rule 
Space were satisfying: We were able to classify a large proportion of examinees, especially those 
of low and medium ability. In principle, these classifications could be reported back to examinees 



³ A standard analysis of variance design would not have been appropriate for these data because the hypothesis of
multisample sphericity (that is, independently observed attributes) is violated. The violation results from the fact
that, instead of measuring a single attribute on each examinee, our design involves taking 38 attribute 
measurements. Thus, non-zero correlations are expected among the attribute measurements associated with a 
particular examinee. 
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so that remediation in weak areas could proceed* Traditional psychometrics has served well in 
discriminating among examinees for selection, placement, or classification on the basis of global 

estimates of proficiency. Rule Space provides estimates of 9, but also yields information that 
could serve the interests of the examinee in pin-pointing areas of non-mastery. Of course, 
applications of the technique described in this paper to other complex domains may require a much 
larger sample size than was used in the current study. Data from a relatively small number of 
examinees were sufficient for the goal of this paper, which was to demonstrate and explain a 
methodology for extending Rule Space. 

In addition to diagnosis and estimation of θ, Rule Space provides a framework for
comparing a model of task performance to examinees' response data. To date, there are few
well-defined methodologies for making such comparisons (but see Polk & Newell, 1991), especially
methodologies that can accommodate a great variety of individual differences in examinees'
knowledge, skill, and strategy. Model testing proceeds as follows: On the basis of a cognitive model,
items are analyzed into their component cognitive attributes. The resulting item/attribute matrix (or
matrices) leads to strong predictions about examinees' response patterns. If the (θ, ζ) position of an
examinee's response pattern is close to that of an ideal response pattern, that examinee is classified
into the knowledge state that the ideal response pattern implies. To the extent that examinees'
response patterns can be classified, the analysis provides support for the cognitive model. There are,
of course, limitations to the Rule Space method. We have already noted that sets of attributes
processed together are limited in size. As they approach 25 or so, the combinations of attribute
profiles make the number of possible ideal states unmanageable. Consequently, the attributes must
be clustered and run separately, as in this study.
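The model-testing steps above can be sketched as follows, under several stated assumptions: the Q matrix (item/attribute matrix) is hypothetical, ideal response patterns are generated with a conjunctive rule (an item is answered correctly only when every attribute it requires is mastered), and closeness is measured by Hamming distance over item responses rather than by the (θ, ζ) distance that Rule Space actually uses. The enumeration step also shows why attribute sets must stay small: K attributes yield 2^K candidate mastery profiles, so 25 attributes already means 2^25 = 33,554,432 profiles.

```python
from itertools import product

# Hypothetical item/attribute (Q) matrix: Q[i][k] == 1 means item i requires
# attribute k. Invented for illustration; these are not the study's items.
Q = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]


def ideal_response(mastery, q_matrix):
    """Conjunctive rule: an item is correct only if all required attributes are mastered."""
    return tuple(
        int(all(m or not req for m, req in zip(mastery, row)))
        for row in q_matrix
    )


def ideal_patterns(q_matrix):
    """Map each distinct ideal response pattern to the mastery profiles that produce it."""
    n_attributes = len(q_matrix[0])
    patterns = {}
    # Enumerates all 2**K mastery profiles; this is what becomes unmanageable as K grows.
    for mastery in product((0, 1), repeat=n_attributes):
        patterns.setdefault(ideal_response(mastery, q_matrix), []).append(mastery)
    return patterns


def hamming(a, b):
    """Number of items on which two response patterns disagree."""
    return sum(x != y for x, y in zip(a, b))


def classify(observed, patterns, max_distance=1):
    """Return the nearest ideal pattern, or None if none lies within max_distance.

    Simplified stand-in: Rule Space itself measures closeness in the (theta, zeta)
    plane, not by Hamming distance over raw responses.
    """
    nearest = min(patterns, key=lambda p: hamming(observed, p))
    return nearest if hamming(observed, nearest) <= max_distance else None


if __name__ == "__main__":
    states = ideal_patterns(Q)
    print(len(states), "distinct ideal patterns from", 2 ** len(Q[0]), "profiles")
    observed = (1, 1, 1, 0, 0)
    match = classify(observed, states)
    print(observed, "->", match, "mastery profile(s):", states.get(match))
    print("profiles to enumerate for 25 attributes:", 2 ** 25)
```

For this three-attribute matrix the script finds 8 distinct ideal patterns and classifies the response pattern (1, 1, 1, 0, 0) into the state in which the first two attributes are mastered; a pattern farther than `max_distance` from every ideal pattern is left unclassified, which mirrors how unclassified examinees count against the model.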

One contribution of this work is that we have outlined a methodology for applying Rule
Space to complex domains. In general, a limitation of Rule Space is that, at a fine-grained level of
analysis, the operations needed to solve items in a complex domain may not overlap a great deal.
Many attributes might in fact be unique to particular items within the item set. If this is the case,
the cognitive attributes must be cast at a higher level of generality, such as item characteristics (e.g.,
type of diagram presented) or the general problem-solving approach needed to solve each item (e.g.,
recalling a fact versus applying a learned procedure). Given more general attributes, what can we
say about an examinee's performance? From a psychological viewpoint, the attributes tell us little
about the examinee's cognitive competence. But from an educational standpoint, the attributes
provide examinees with just the information they need to improve their performance on subsequent
tests. The attributes allow us to say that an examinee has difficulty with items having certain
properties. While we may have little information about the examinee's skill at a fine-grained level,
the diagnostic reports (which attributes are mastered and which are not) do tell examinees what
types of problems they should seek out and practice solving, and what components of problem
solving need special attention.
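A report of the kind described above could be produced directly from an examinee's attribute mastery probabilities. This minimal sketch assumes hypothetical probabilities for one examinee, attribute names borrowed from Table 2, and an arbitrary 0.5 cutoff for calling an attribute mastered; the paper does not specify a cutoff or a report format, and `diagnostic_report` is a hypothetical helper.

```python
# Hypothetical probability-of-mastery estimates for one examinee (not study data).
EXAMINEE = {
    "Contour Lines": 0.91,
    "Environment": 0.35,
    "Label": 0.72,
    "Move/Rotate": 0.28,
}


def diagnostic_report(probs, cutoff=0.5):
    """Split attributes into mastered and not-yet-mastered groups.

    The 0.5 cutoff is an illustrative assumption, not a value from the study.
    """
    mastered = sorted(a for a, p in probs.items() if p >= cutoff)
    practice = sorted(a for a, p in probs.items() if p < cutoff)
    return {"mastered": mastered, "practice": practice}


if __name__ == "__main__":
    report = diagnostic_report(EXAMINEE)
    print("Mastered:", ", ".join(report["mastered"]))
    print("Seek out and practice items involving:", ", ".join(report["practice"]))
```

Because the attributes describe item properties rather than fine-grained skills, the "practice" list translates directly into the kinds of items an examinee should seek out.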

Attributes should be based on an independently constructed problem-solving model. 
Analysis of verbal protocols, performed in this work, serves as one means for constructing and
verifying a cognitive model. The model supports attribute creation by showing which aspects of 
the items would help or hinder problem-solving performance. In contrast to developing a list of 
attributes intuitively, a cognitive model provides a rich description of each attribute because the 
meaning of each attribute is derived from its place in the model. Methodologically, this rich 
attribute description promotes a fuller understanding of what each attribute means and facilitates the 
assigning of attributes to items. 

Another contribution of this work is that we were able to examine the power of attributes to 
discriminate among examinees of various levels. Knowing which attributes are highly 
discriminating has value for the construction of items as well as for the design and sequencing of
instruction. Differential relevance of attributes across proficiency groups also sheds light on the
nature of expert/novice differences in the domain of interest. Rule Space holds a great deal of
value both for satisfying the requirements of traditional psychometrics and for diagnosing individual
examinees. Through the use of such models, psychometrics has much to offer learners and
teachers beyond estimates of global proficiency.
Table 1

Problem Solving Model

Attribute Group             Processes

Understanding the Item      Read the item stem
                            Scan the diagram
                            Recall relevant information

Solving the Item            Set subgoals
                            Perform actions

Checking Performance        Is the action correct?
                            Is the current goal completed?
Table 2

Attribute Definitions

Characteristics of Presented Figure
   Relations among attributes in the class: The three attributes in this class are mutually exclusive
   (if an item has one attribute in this class, by definition it does not have another attribute from
   the same class) and exhaustive (all of the items may be classified as having at least one of the
   attributes in this class).
   Picture: Presented figure is a sketch of an actual object.
   Diagram: Presented figure is an abstract diagram of an object.
   Specialized Diagram: Presented figure is a graph or chart, that is, a visual representation of
   some information.

Clarity of General Task
   Relations among attributes in the class: Mutually exclusive, but not exhaustive.
   Diagram Obvious: Based on just the presented figure, it's possible for someone to understand what
   task the item is asking them to perform. Details regarding the task included in the item stem
   might still be needed for correct performance of the task.
   Own Obvious: Based on the presented figure along with some prior knowledge, it's possible for
   someone to understand what task the item is asking them to perform. Details regarding the task
   included in the item stem might still be needed for correct performance of the task.

Problem-Solving Requirements of Item
   Relations among attributes in the class: Mutually exclusive and exhaustive.
   Declarative: Requires knowing particular architectural symbols and definitions for correct
   solution.
   Learned Procedure: Requires the application of fairly standard, algorithmic procedures that
   usually would have been learned previously.
   Discovered Strategy: Requires the application of knowledge or procedures in a novel way. These
   items are more puzzle-like.

Content Area
   Relations among attributes in the class: Mutually exclusive and exhaustive.
   Attributes: Site Design; Structural Technology (General); Structural Technology (Lateral
   Forces); Materials and Methods; Construction Documents.
   Description: The item tests knowledge or skills associated with one of these recognized
   subdisciplines of architecture.

Particular Architectural Features
   Relations among attributes in the class: Neither mutually exclusive nor exhaustive.
   Identify Street: Correct problem solving requires that the candidate can recognize a street on a
   site plan.
   Environment: Correct problem solving requires that the candidate knows about constraints due to
   environmental factors (e.g., weather, earthquakes).
   Contour Lines: Requires the ability to read and interpret contour lines.
   Forces: Requires the ability to recognize, interpret, and use force vectors.

General Problem-Solving Approach
   Relations among attributes in the class: Mutually exclusive, but not exhaustive.
   Read and Translate: Problem solving goes through cycles of getting information from the problem
   stem, using that information to generate part of the answer, and then repeating.
   Indicate Location of New Feature: Problem solving involves placing given elements into new
   positions or adding information to the provided diagram.

Response Method
   Relations among attributes in the class: Exhaustive, but not mutually exclusive.
   Move/Rotate: Requires arrangement of provided elements.
   Label: Requires selecting which of a provided set of labels should be placed at various indicated
   points on the diagram.
   Draw Line: Requires drawing of lines onto the provided diagram.
   Draw Arrow: Requires drawing of arrows onto the provided diagram.

Misleading Characteristics
   Relations among attributes in the class: Mutually exclusive, but not exhaustive.
   Stem Incorrect: Without detailed knowledge of an item type, the item's stem suggests an incorrect
   problem-solving method.
   Diagram Incorrect: Without detailed knowledge of an item type or diagram type, the item's
   provided diagram suggests incorrect problem-solving methods.

Relation between Stem and Problem Solving
   Relations among attributes in the class: Stem Independent and Stem Dependent are mutually
   exclusive and exhaustive. Initial Info in Stem and Interim Info in Stem are mutually exclusive
   and exhaustive across Stem Dependent items.
   Stem Independent: The item stem provides practically no information that could not be gained
   either through prior knowledge or through the provided figure.
   Stem Dependent: Problem solving is necessarily based on information presented in the item stem.
   This category is the union of the "Initial Info" and "Interim Info" categories.
   Initial Information in Stem: While the stem information is necessary for correct solution, that
   information is not directly required during the course of problem solving.
   Interim Information in Stem: The information in the stem is needed a number of times during the
   course of correct problem solving.

Completion Criteria
   Relations among attributes in the class: Mutually exclusive and exhaustive.
   Own Knowledge Stop: Examinees must use their own knowledge to decide whether they are finished
   responding to an item (i.e., if the answer is complete). Neither the stem nor the diagram directly
   supplies this information.
   Diagram Stop: The provided diagram indicates whether an answer is complete.
   Diagram and Own Knowledge Stop: The provided diagram along with some specialized knowledge
   indicates whether an answer is complete.
   Stem Stop: Information provided in the stem indicates whether a given answer is complete.

Number of Correct Responses
   Relations among attributes in the class: Mutually exclusive and exhaustive.
   One Correct: The item has only one correct answer.
   Few Correct: The item has two or three correct answers, which are variants of one another.
   Many Correct: The item has several correct answers, some of which may be qualitatively different
   from others and some of which may be variants of another answer.
Table 3

Attributes Associated with "Library" Item (Figure 1)

Attribute: Explanation

Specialized Diagram: The provided figure is a site plan, which is an abstract diagram of the actual
building site. The site plan diagram contains elements that require specialized knowledge to
interpret (e.g., contour lines, property lines, symbols for trees).

Diagram Obvious: Based on the provided elements and the operations available (move, etc.), it is
clear that the general procedure for this task is to place the elements somewhere onto the site.

Discovered Strategy: There is no clear, algorithmic procedure for placing the buildings onto the
site. Examinees must bring knowledge learned in different situations to bear on solving this task.

Site Design: This item presents a prototypical site design task.

Identify Street: Recognizing the street on the site plan is important for correct placement of the
parking lot.

Contour Lines: Correctly interpreting the site plan's contour lines is necessary for correct placement
of the buildings on the site (e.g., the buildings should not be placed on the steep slope, but on
relatively level ground).

Stem Independent: Beyond the general task and the standard "preserve all trees," the stem does not
provide any information that is vital to the correct solution of the item.

Many Correct: There are a number of correct solutions to this item, reflecting different
arrangements of the buildings on the site.

Move/Rotate: The primary interface operation in this task is moving elements and rotating them to
fit better onto the site.

Own Stop: Based on their own knowledge, examinees must determine when they are finished
responding to the item. Nothing in the stem or in the diagram provides feedback on either the
correctness or the completeness of a response.
Table 4

Attribute Assignments to Processing Types



Attribute Class 


Attribute 


Problem-Solving Process Type 




Understand 


Solve Check 


Characteristics of 
presented figure 


Picture 
Diagram 

Specialized Diagram 


X 

X 
X 




Clarity of general task 


Diagram Obvious 
Own Obvious 


X 
X 


X 
X 


Problem-solving 
requirements of item 


Learned Algorithm 

Declarative 
Discovered Strategy 


X 

X 
X 


X 

X 
X 


Content area 


Site Design 
Structural Technology 
Structural Tech. (Lateral Forces) 
Materials and Methods 
Construction Documents 


X 
X 
X 
X 
X 




Particular architectural 
features 


Identify Street 

Environment 
Contour Lines 
Forces 


X 

X 
X 
X 


X 

X 
X 
X 


Relation between stem 
and problem-solving 


Stem Independent 

Stem Dependent 
Initial Info in Stem 


X 

X 
X 





Interim Info in Stem



X 



Number of correct 
responses 



One Correct 



Few Correct 
Many Correct 
Read and Translate



X 
X 



X 

X 
X 



General problem-solving 
approach 



Indicate Location of New Feature 



X 



Response method 



Move/Rotate 
Label 
Draw Line 
Draw Arrow 



X 
X 
X 



Completion Criteria 



Own Stop 
Diagram Stop 
Stem Stop 

Diagram + Own Stop 



X 
X 
X 
X 

IT 



Misleading 
Characteristics 



Stem Incorrect 
Diagram Incorrect



X 
Table 5

Distribution of Status Groups by Proficiency

                                  Status Group
                       Student            Intern             Architect
Proficiency     N      N   Column %       N   Column %       N   Column %
Low            41     27      51         10      29          4      12
Medium         41     17      32         12      34         12      35
High           40      9      17         13      37         18      53
Total         122     53     100         35     100         34     100
Table 6

Protocol Encoding Categories



Understand

Read stem: statements involving the reading of the problem stem. Read statements include
verbatim readings of the stem as well as partial readings of the problem stem.

Scan diagram: statements involving the provided diagram. Diagram statements include verbatim
readings of verbal information as well as verbal descriptions of information in the diagram (e.g.,
"lateral forces coming that way").

Relate: statements regarding how the problem or parts of the problem relate to the examinee's own
knowledge. Relate statements consist of several types of verbalizations, including verbalizations
regarding:

- an expectation or the violation of an expectation (e.g., "Normally there would be more
lines on this window drawing")

- recognition of the problem (or part of the problem) as being of a particular type (e.g., "This is a
site vignette," "this is a perspective drawing")

- predictions as to the difficulty of the problem (e.g., "this will take a while")

- the definitions or ambiguity of sections of the problem (e.g., "is it an awning or a
hopper?", "most sheathing I know of is...")

Solve 

Goal: stating an intent or future action. Goal statements are often stated in the future tense or in
terms of "should be."

Perform: statements regarding the performance of an action. Perform statements are usually stated
in the present or "continuing" tense (e.g., "that dips here"). Perform statements relate only to
physical actions such as moving a block on the screen or locating a particular item in the diagram
(for the latter, e.g., "this is a flat area"). It may be difficult to distinguish between goal and perform
statements.

Check 

Evaluate-correct: statements regarding the correctness of a performed action or the result of that
action (e.g., the location of a placed object). Evaluate-correct statements should refer only to the
examinee's own actions or answers, not to the problem itself. These statements may either reflect
judging the correctness of an action (e.g., "is that right?") or reflect the outcomes of a judgment
(e.g., "that isn't what I wanted to do").

Evaluate-complete: statements suggesting that some action or goal has been completed. As with
evaluate-correct statements, evaluate-complete statements include verbalizations judging whether
something has been finished (e.g., "is there anything else to be done?") as well as verbalizations
concerning the results of such judgments (e.g., "that's it," "that was easy").
Table 7

Classification Results for Subjects Grouped by Proficiency and Status

                                     Problem-Solving Process Type
                          Understand             Solve                Check
Group            N    No. Classified   %    No. Classified   %    No. Classified   %

Proficiency
  Low           41         32         78         41         100        25         61
  Medium        41         29         71         40          98        13         32
  High          40         20         50         33          83        12         30

Status
  Student       53         37         70         52          98        25         47
  Intern        35         23         66         32          91        11         31
  Architect     34         21         62         30          88        14         41

Total          122         81         66        114          93        50         41


Table 8

Analysis of Variance Results for Attribute Mastery Data

                                     Problem-Solving Process Type
                                Understand       Solve         Check
Group                           p-value          p-value       p-value

Between Subjects
  Proficiency                   .0001            .0001         .0001
  Status                        .5621            .1948         .3433
  Proficiency x Status          .1343            .4707         .7231

Within Subjects (a)
  Attribute                     .0001            .0001         .0001
  Attribute x Proficiency       .0013            .0001         .0130
  Attribute x Status            .0885            .0874         .4287
  Attr. x Prof. x Status        .4743            .0535         .1029

(a) p-values for within-subject effects were calculated using Wilks' Lambda.
Table 9

Attribute Mastery Probabilities for Solve

                               Proficiency
Attribute               Low      Medium    High     Overall Mean
Many Correct            .35      .28       .35      .34
Draw Arrow              .44      .38       .33      .38
Move/Rotate             .31      .39       .61      .44
Label                   .43      .66       .88      --
Environment             .47      .60       .97      --
Contour Lines           .61      .78       --       .74
Forces                  --       --        --       --
Identify Street         --       --        --       --
Interim Info            --       --        --       --
Diagram Obvious         .70      .76       .94      .80
Own Obvious             .81      .83       .86      .83
Few Correct             .66      .91       .98      .85
Discovered Strategy     --       --        --       --
Ind. Location           .75      1.00      1.00     .92
Read + Translate        .86      .98       .98      .94
Declarative             .87      .97       1.00     --

Note. -- indicates a value that is illegible in the source copy.
Figure 1. Sample figural response item.